Abstract: Many storage channels admit reading and rewriting
of the content at a given cost. We consider rewritable channels 
with a hidden state which models the unknown characteristics 
of the memory cell. In addition to mitigating the effect of the 
write noise, rewrites can help the write controller obtain a better 
estimate of the hidden state. The paper has two contributions. 
The first is a lower bound on the capacity of a general rewritable 
channel with hidden state. The lower bound is obtained using 
a coding scheme that combines Gelfand-Pinsker coding with 
superposition coding. The rewritable AWGN channel is discussed 
as an example. The second contribution is a simple coding scheme 
for a rewritable channel where the write noise and hidden state 
are both uniformly distributed. It is shown that this scheme is 
asymptotically optimal as the number of rewrites gets large. 

I. Introduction 

In nonvolatile memory technologies, the write mechanism 
is commonly impaired by write noise due to which the value 
written on a cell is different from the one intended. An 
important feature of many of these technologies such as Flash
[1], [2], Phase Change Memory [3] and Resistive RAM [4], [5] is
that they allow rewriting, i.e. the value written on a memory 
cell can be read and rewritten if necessary. Rewrites can 
increase the storage capacity but are costly since they are 
time consuming and degrade the memory. Hence there is a 
fundamental trade-off between the number of writes and the 
amount of information that can be stored in a memory cell. 

Given a memory array of n cells, the goal is to maximize 
the number of distinct messages that can be reliably encoded 
in the array, subject to a constraint on the average or maximum 
number of writes per cell. The cells are assumed to be 
statistically independent. Rewritable channels were introduced 
in [6] and subsequently studied in [7]-[10] under an average
cost constraint. Maximum cost constrained rewritable channels
were considered in [11]-[14].

In practice, a memory cell is an amalgam of physical 
components which reacts to inputs in some way that designers 
hope to model as well as possible. However, there are always 
some unknown characteristics of the cell due, for example, 
to fabrication variability. These characteristics, which may be 
too costly to learn, introduce an extra degree of uncertainty 
into the value written on the cell. In this paper, we model 
this effect with the channel $P_{Y|X,S}$, where X is the input
stimulus, S is a hidden (unknown) state parameter of the cell
and Y is the value stored in the cell. S is assumed to have a
known distribution $P_S$. The alphabets of X, Y, S are denoted
by $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{S}$, respectively. We consider two canonical examples
of rewritable channels in this paper:

1) Uniform noise channel with state: The channel model is 

Y = X + W + S. (1) 

$X \in [0, 1]$ is the input stimulus, the write noise W is
uniformly distributed in $[-a/2, a/2]$, and the state S is
uniformly distributed in $[0, B]$, where a and B are known
positive constants. The basic version of this channel (S = 0)
was introduced in [6] as a simple model that captures
some essential features of non-volatile memories such
as analog inputs and bounded write noise.

2) AWGN channel with state: The channel is again described
by the additive model (1). The write noise and
the state are Gaussian random variables: W is distributed
as $\mathcal{N}(0, N)$ and S is distributed as $\mathcal{N}(0, \sigma_s^2)$.$^1$ The input
is constrained in terms of either the average or peak
power per write.

A key feature of the state S is that it stays fixed across write
attempts in each cell. Conditioned on S = s, the value stored
in the cell after each attempt depends only on the most recent
input stimulus: it is determined according to $P_{Y|XS}(\cdot \mid X, S = s)$,
where X is the current input. In the additive model (1),
this means that each write attempt on a cell is affected by
an independent realization of the random variable W, while
S stays fixed across write attempts.

In this paper, we consider rewritable channels with a constraint
on the average number of writes per cell. Given a
constraint k on the average write cost, the goal is to determine 
the capacity C(k) and design coding schemes to achieve rates 
close to C(k). 

We consider the following class of coding schemes. To write
on cell i, the write controller applies stimuli $X_i^{(1)}, X_i^{(2)}, \ldots$
until the output falls within a target region $T_i$, where $T_i$ is
a subset of the output space. The jth write stimulus $X_i^{(j)}$
can depend on the outputs of the previous stimuli, denoted
$Y_i^{(1)}, \ldots, Y_i^{(j-1)}$. Formally, a rewrite code of rate R over an
array of n cells is defined by:

• An encoder mapping which maps a message in
$\{1, \ldots, 2^{nR}\}$ to a sequence $((\mathbf{X}_1, T_1), \ldots, (\mathbf{X}_n, T_n))$,
where $T_i$ is the target region for cell i, and $\mathbf{X}_i =
(X_i^{(1)}, X_i^{(2)}, \ldots)$ is the input strategy for cell i.

• A decoder which maps the output sequence $(Y_1, \ldots, Y_n)$
to $\{1, \ldots, 2^{nR}\}$.

$^1$ $\mathcal{N}(\mu, \sigma^2)$ is the Gaussian distribution with mean $\mu$ and variance $\sigma^2$.



For cell i, the number of writes needed for the output to
fall within region $T_i$ is a random variable, denoted $\tau_i(\mathbf{X}_i, T_i)$,
where $\mathbf{X}_i = (X_i^{(1)}, X_i^{(2)}, \ldots)$ is the input strategy. The
average write-cost of the code is $\frac{1}{n}\sum_{i=1}^{n} E\,\tau_i(\mathbf{X}_i, T_i)$. Due
to the statistical independence of the cells, the capacity for an
average cost constraint k is [6], [9]

$$C(k) = \sup_{\mathbf{X}, T:\ E\tau(\mathbf{X}, T) \le k} I(\mathbf{X}T; Y). \tag{2}$$

The capacity formula in (2) is not easy to compute in
general. This is because the optimization is over adaptive
strategies where each input stimulus can depend on the
outcomes of the previous stimuli. Adaptive strategies are
particularly useful in channels with hidden state because we
get a better estimate of the state with each write, which can
be used to generate the next stimulus.$^2$ For intuition, consider
two extreme cases: 

• When k = 1, we are allowed only one write attempt and 
the hidden state is treated as an additional noise variable. 

• When the average cost constraint $k \to \infty$, we can
spend a number of write attempts to get a very good 
estimate of the state, and use the remaining writes to store 
information by designing the input stimulus to nullify the 
effect of the state. Thus we expect the storage rate to 
approach the no-state capacity when k is very large. 

For $1 < k < \infty$, the challenge is to simultaneously learn the
state while attempting to store information at a high rate. 
The main contributions of this paper are as follows. 

1) In Section II, we derive a capacity lower bound for
continuous-output rewritable channels with state. The
scheme used to obtain this bound involves a state estimation
phase followed by a coding phase. The writing strategy
in the coding phase combines two techniques from
multi-user information theory: Gelfand-Pinsker coding
[15], [16] and superposition coding [17]. The AWGN
rewritable channel is discussed as an example.

2) In Section III, we focus on the uniform noise channel
and present a coding scheme that is computationally
simple and amenable to practical implementation. The
scheme implicitly combines state estimation and coding,
and is shown to be asymptotically optimal as the number
of rewrites gets large.

The rewritable channel considered in this paper is a stylized 
model relevant to technologies like Phase Change Memory 
and Resistive RAM which have analog outputs. Both these 
memory technologies are known to be affected by variability 
across devices [3], [18], which to the first order can be
modeled as a hidden state. Though relaxing assumptions such 
as noiseless reads would make the model more realistic, we 
believe that the current model gives useful insights regarding 
how rewrites can be harnessed to improve the storage density 
of these memories. 

$^2$ For memoryless rewritable channels without state, we can restrict the input
strategies to be non-adaptive, i.e. repeatedly apply the same stimulus to a cell
until the target region is hit. See [9], [10].



Notation: We use upper-case letters to denote random vari- 
ables and bold-face notation for random vectors. Entropy and 
mutual information are measured in bits, and logarithms are 
with base 2 unless otherwise mentioned. 

II. Lower Bound on the Rewrite Capacity 

For an average write cost k, we design a scheme consisting
of two phases: an estimation phase of l (< k) writes to learn
the state S, and a coding phase requiring an average of k - l
writes. For the coding phase, we combine two techniques: 1)
Gelfand-Pinsker coding [15], which achieves the optimal rate
given the state estimate if we are allowed only a single write
for the coding phase (i.e., k - l = 1), and 2) superposition
coding [17] to store an additional log(k - l) bits/cell when
k > l + 1.

Before presenting the general result, we describe the coding 
scheme for the AWGN rewritable channel to highlight the 
main ideas. 

A. The AWGN channel with hidden state 

The channel is defined by (1) with the write noise $W \sim
\mathcal{N}(0, N)$ and the state $S \sim \mathcal{N}(0, \sigma_s^2)$. We assume that there
is an average power constraint P, i.e., in each write attempt
the average power of the input stimulus across the n cells is
at most P.

State Estimation: The first step is to construct an estimate
of the state of each cell using l writes. Due to the symmetry
of the channel model, this can be done by applying any input,
say c, for l writes and recording the outputs $Y^{(1)}, \ldots, Y^{(l)}$,
which are generated according to

$$\begin{aligned} Y^{(1)} &= c + W^{(1)} + S \\ &\ \ \vdots \\ Y^{(l)} &= c + W^{(l)} + S \end{aligned} \tag{3}$$

where $W^{(1)}, \ldots, W^{(l)}$ are independent $\mathcal{N}(0, N)$ random variables.
The minimum mean squared error (MMSE) estimate of
S given the observations $Y^{(1)}, \ldots, Y^{(l)}$ is

$$\hat{S}(l) = E[S \mid Y^{(1)}, \ldots, Y^{(l)}] = \frac{\sigma_s^2}{l\sigma_s^2 + N} \sum_{j=1}^{l} \left(Y^{(j)} - c\right). \tag{4}$$
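The estimator in (4) and its error variance can be checked numerically. The following is a minimal Python sketch, assuming the jointly Gaussian model above; the function names and the simulation parameters (`trials`, `seed`, the default input `c`) are illustrative choices, not part of the paper.

```python
import random

def mmse_state_estimate(outputs, c, sigma_s2, noise_var):
    """MMSE estimate of S from l outputs Y^(j) = c + W^(j) + S, as in (4)."""
    l = len(outputs)
    gain = sigma_s2 / (l * sigma_s2 + noise_var)
    return gain * sum(y - c for y in outputs)

def mmse_error_variance(l, sigma_s2, noise_var):
    """Closed-form E[(S - S_hat(l))^2] for the Gaussian model."""
    return sigma_s2 * noise_var / (l * sigma_s2 + noise_var)

def simulate_error_variance(l, sigma_s2, noise_var, c=0.5, trials=20000, seed=0):
    """Monte Carlo estimate of the same error variance (illustrative only)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s = rng.gauss(0.0, sigma_s2 ** 0.5)          # hidden state, fixed per cell
        outputs = [c + rng.gauss(0.0, noise_var ** 0.5) + s for _ in range(l)]
        total += (s - mmse_state_estimate(outputs, c, sigma_s2, noise_var)) ** 2
    return total / trials
```

The closed-form error variance decreases roughly as N/l for large l, which is why a few estimation writes already remove most of the state uncertainty.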

Encoding: The write channel for the (l + 1)th write can be
expressed as

$$Y^{(l+1)} = X^{(l+1)} + \hat{S}(l) + \left(S - \hat{S}(l)\right) + W^{(l+1)}. \tag{5}$$

The estimate $\hat{S}(l)$ is known to the encoder prior to the (l + 1)th
write. Further, $(S - \hat{S}(l))$ is independent of $\hat{S}(l)$ due to
the orthogonality principle [19] and the joint Gaussianity of
$(S, \hat{S}(l))$.

Let us first consider the case where we use only a single
write after the estimation period. For write l + 1, (5) describes
a channel with state $\hat{S}(l)$ known to the encoder and effective
channel noise $S - \hat{S}(l) + W^{(l+1)}$, which is independent of $\hat{S}(l)$.



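The effective channel noise is a Gaussian random variable
distributed as $\mathcal{N}(0, N_{\text{eff},l})$ where

$$\begin{aligned} N_{\text{eff},l} &= E\left[\left(S - \hat{S}(l) + W^{(l+1)}\right)^2\right] \\ &= E\left[\left(S - \hat{S}(l)\right)^2\right] + E\left[\left(W^{(l+1)}\right)^2\right] \\ &= \frac{\sigma_s^2 N}{l\sigma_s^2 + N} + N. \end{aligned} \tag{6}$$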

The optimal coding scheme for this channel is the 'writing
on dirty paper' scheme of Costa [16]. The key idea is to
incorporate part of the known state $\hat{S}(l)$ into the codeword.
This is done by building a codebook over an auxiliary random
variable $U \sim \mathcal{N}(0, P + \alpha^2 \sigma_{\hat{s},l}^2)$ where

$$\alpha = \frac{P}{P + N_{\text{eff},l}}, \tag{7}$$

$$\sigma_{\hat{s},l}^2 = E[\hat{S}(l)^2] = \frac{l\sigma_s^4}{l\sigma_s^2 + N}. \tag{8}$$



Let the storage rate be R bits/cell. We build a U-codebook
with $2^{nR_1}$ codewords whose elements are generated i.i.d.
according to $\mathcal{N}(0, P + \alpha^2 \sigma_{\hat{s},l}^2)$. The value of $R_1 > R$ will be
specified below. The codebook is divided into $2^{nR}$ bins, with
each bin containing $2^{n(R_1 - R)}$ codewords. Each bin represents
a message in the set $\{1, \ldots, 2^{nR}\}$. To transmit message m,
the encoder attempts to find a codeword $\mathbf{U}$ within bin m such
that $(\mathbf{U} - \alpha\hat{\mathbf{S}}(l))$ is nearly orthogonal to $\hat{\mathbf{S}}(l)$. Formally, the
encoder finds a codeword $\mathbf{U}$ that is jointly typical$^3$ [20] with
$\hat{\mathbf{S}}(l)$ according to the distribution described by

$$U = X + \alpha\hat{S}(l) \tag{9}$$

where $X \sim \mathcal{N}(0, P)$ and $\hat{S}(l) \sim \mathcal{N}(0, \sigma_{\hat{s},l}^2)$ are independent
Gaussians. From rate-distortion theory, this step will be
successful if the number of sequences in each bin, $2^{n(R_1 - R)}$,
is larger than $2^{nI(U; \hat{S}(l))}$, where the mutual information is
computed using the joint distribution described by (9). The
codeword written on the n cells is

$$\mathbf{X}^{(l+1)} = \mathbf{U} - \alpha\hat{\mathbf{S}}(l). \tag{10}$$



Note that $\mathbf{X}^{(l+1)}$ has average power nearly P, due to (9). The
sequence stored on the cells is

$$\mathbf{Y}^{(l+1)} = \mathbf{X}^{(l+1)} + \mathbf{S} + \mathbf{W}^{(l+1)}. \tag{11}$$



The decoder's task is to decode the codeword $\mathbf{U}$ from
$\mathbf{Y}^{(l+1)}$. The corresponding bin index then gives the message.
If the encoding operation is successful, $(\mathbf{U}, \mathbf{Y}^{(l+1)})$ are jointly
typical according to

$$\begin{aligned} U &= X + \alpha\hat{S}(l), \\ Y &= X + \hat{S}(l) + \left(S - \hat{S}(l) + W\right), \end{aligned} \tag{12}$$

where $X \sim \mathcal{N}(0, P)$, $\hat{S}(l) \sim \mathcal{N}(0, \sigma_{\hat{s},l}^2)$, and $(S - \hat{S}(l) +
W) \sim \mathcal{N}(0, N_{\text{eff},l})$ are mutually independent random variables.
If we use a maximum-likelihood or joint typicality
decoder [20], the codeword $\mathbf{U}$ can be successfully decoded
if $R_1 < I(U; Y)$.

$^3$ Roughly speaking, sequences $(\mathbf{U}, \mathbf{S})$ are jointly typical according to
distribution P if their empirical joint distribution is close to i.i.d. P.

Fig. 1: Target regions for superposition coding. The shaded regions
together form the target region for one message.

Combining this with the earlier bound $R_1 - R > I(U; \hat{S}(l))$,
we conclude that rates

$$R < I(U; Y) - I(U; \hat{S}(l)) \tag{13}$$

are achievable. Computing this with the joint distribution
specified in (12), we obtain that any rate

$$R < \frac{1}{2} \log\left(1 + \frac{P}{N_{\text{eff},l}}\right) \tag{14}$$

is achievable.

If we restrict ourselves to a single write after the estimation
period, (14) gives the optimal rate. This is because even when
the encoder and decoder both know $\hat{S}(l)$ a priori, the maximum
rate is given by (14). (The decoder can simply cancel off
any effect of $\hat{S}(l)$ from the stored value.) When $\hat{S}(l)$ is not
available at the decoder, the Costa coding scheme nullifies its
effect by incorporating part of it into the codeword $\mathbf{U}$.

Superposition: When k > l + 1, we have available more than
one write after the estimation period. We use superposition
coding to store additional bits using the remaining writes. For
the sake of intuition, temporarily assume that k is an integer.

The idea is to partition the output space ($\mathbb{R}$ for the AWGN
channel) into (k - l) different regions such that the output is
equally likely to fall in each of these regions regardless of
the input. This is done in the following way. We divide the
real line into intervals of length $\delta$, and assign to the intervals
labels 1, ..., (k - l) in succession, as shown in Figure 1. Target
region 1 is the union of the intervals marked 1, target region
2 is the union of the intervals marked 2, and so on. Formally,
we define the target regions $T_j$ for j = 1, ..., k - l as

$$T_j = \bigcup_{i \in \mathbb{Z}} \Big[\big((k - l)i + j\big)\delta - \delta,\ \big((k - l)i + j\big)\delta\Big). \tag{15}$$

We let $\delta \to 0$ for reasons explained below.

For each cell $i \in \{1, \ldots, n\}$, the additional information
stored is represented by a message $m_i$ drawn uniformly from
the set $\{1, \ldots, k - l\}$. At the end of the estimation period, the
controller uses the state sequence estimate $\hat{\mathbf{S}}(l)$ to determine
the codeword $\mathbf{U} = (U_1, \ldots, U_n)$ of the Costa scheme. It then
repeatedly applies stimulus $X_i$ to cell i until the output falls
in the target region $m_i$. Recall that

$$X_i = U_i - \alpha\hat{S}_i(l).$$

In each write attempt, the noise realization is an independent
realization of a $\mathcal{N}(0, N)$ random variable. However, the state
estimation error $S - \hat{S}(l)$ remains constant across attempts
and is unknown to the decoder. Defining each target region as
a collection of disjoint infinitesimal intervals ensures that the
output in each write attempt is equally likely to lie in each of
the (k - l) target regions, regardless of the value of $S - \hat{S}(l)$.
Thus the number of writes required to obtain an output in the
desired region is a geometric random variable with mean
(k - l).

The total number of writes (including the estimation period)
for cell i is denoted $\tau_i$, and the final value stored in cell i is
$Y_i^{(\tau_i)}$. The discussion above shows that

$$E[\tau_i] = l + k - l = k. \tag{16}$$
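The claim that the number of superposition writes is geometric with mean k - l can be illustrated by simulation. The sketch below is an assumption-laden illustration rather than the paper's procedure: it uses a small but finite interval width `delta` in place of the limit, labels the regions as in (15), and treats the stimulus plus the fixed hidden state as a single constant `mean_output`. All function and parameter names are hypothetical.

```python
import math
import random

def region_label(y, delta, num_regions):
    """Label in {1, ..., num_regions} of the target region of (15) containing y."""
    return math.floor(y / delta) % num_regions + 1

def writes_until_hit(mean_output, target, delta, num_regions, noise_std, rng):
    """Rewrite with fresh Gaussian noise until the output lands in region `target`."""
    writes = 0
    while True:
        writes += 1
        y = mean_output + rng.gauss(0.0, noise_std)  # stimulus and state stay fixed
        if region_label(y, delta, num_regions) == target:
            return writes

def average_writes(mean_output, target, delta, num_regions,
                   noise_std=1.0, trials=20000, seed=0):
    """Empirical mean number of writes over independent cells."""
    rng = random.Random(seed)
    total = sum(writes_until_hit(mean_output, target, delta, num_regions, noise_std, rng)
                for _ in range(trials))
    return total / trials
```

With `delta` much smaller than the noise standard deviation, the empirical mean is close to the number of regions for any value of `mean_output`, which is the property the scheme relies on.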

Decoding: The decoder observes the stored sequence

$$\mathbf{Y}^{(\tau)} = \left(Y_1^{(\tau_1)}, \ldots, Y_n^{(\tau_n)}\right)$$

and attempts to decode the codeword $\mathbf{U}$. The key observation
is that $(\mathbf{U}, \mathbf{Y}^{(l+1)})$ and $(\mathbf{U}, \mathbf{Y}^{(\tau)})$ have the same joint distribution.
This is because for each cell i, the write stimulus and the
channel state remain the same for writes (l + 1) through $\tau_i$, and
the noise realizations $W^{(l+1)}, \ldots, W^{(\tau_i)}$ are i.i.d. $\mathcal{N}(0, N)$.

Thus $(\mathbf{U}, \mathbf{Y}^{(\tau)})$ is jointly typical according to (12), and the
codeword $\mathbf{U}$ can be reliably decoded if R satisfies (14). The
target region containing the output of cell i directly gives the
message $m_i \in \{1, \ldots, k - l\}$ stored in the superposition phase.

When k is not an integer, we can vary the number of target
regions across the cells in order to achieve a write-cost of k.
For example, if $k = 1 + \lambda$ for $\lambda \in (0, 1)$, we can code a
fraction $\lambda$ of the n cells at average cost 2 and the remaining
$n(1 - \lambda)$ cells at average cost 1 to obtain an overall cost of
$k = 1 + \lambda$. Thus the straight line joining the value of the lower
bound at k = 1 and k = 2 is a lower bound for $k \in (1, 2)$.
In general, the convex hull of the rates achieved at the integer
points can be achieved through 'cost-sharing' between cells.
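Cost-sharing between two neighboring integer costs is a linear interpolation of the corresponding rates. A minimal Python sketch follows; `cost_sharing_rate` and `rate_at_integer` are hypothetical names, and the sketch interpolates only between adjacent integers, which is a lower bound on (not necessarily equal to) the full convex hull.

```python
import math

def cost_sharing_rate(k, rate_at_integer):
    """Rate obtained by splitting cells between costs floor(k) and floor(k) + 1.

    rate_at_integer: callable giving the achievable rate at an integer cost
    (an assumption of this sketch; any integer-cost bound can be plugged in).
    """
    lower = math.floor(k)
    lam = k - lower                    # fraction of cells coded at cost lower + 1
    if lam == 0:
        return rate_at_integer(lower)
    return (1 - lam) * rate_at_integer(lower) + lam * rate_at_integer(lower + 1)
```

For instance, with rates 1.0 and 3.0 bits at costs 1 and 2, a cost of 1.25 gives 0.75 * 1.0 + 0.25 * 3.0 = 1.5 bits per cell.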

The performance of the two-step coding strategy
described above is summarized in the following proposition.

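Proposition 1: Consider the channel described by (1) with
state $S \sim \mathcal{N}(0, \sigma_s^2)$, noise $W \sim \mathcal{N}(0, N)$ and an average
power constraint P on the input. With average cost $k \ge 1$, the
rewrite capacity satisfies

$$C(k) \ge \operatorname{conv}\left( \max_{l \in \{0, 1, \ldots, \lfloor k \rfloor - 1\}} \left[ \frac{1}{2} \log\left(1 + \frac{P}{N_{\text{eff},l}}\right) + \log \lfloor k - l \rfloor \right] \right) \tag{17}$$

where conv denotes the convex hull and

$$N_{\text{eff},l} = N \left(1 + \frac{\sigma_s^2}{l\sigma_s^2 + N}\right).$$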

Figure 2 shows the capacity lower bound for P/N = 100 and
$1 \le k \le 10$. Curves are plotted for $\sigma_s^2 = N$ and $\sigma_s^2 = 10N$.
For the second case, the maximum in (17) is attained with an
estimation period of l = 0 for k = 1, l = 1 for $2 \le k \le 9$,
and l = 2 for k = 10.
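The inner maximization of (17) is easy to evaluate at integer costs. The Python sketch below assumes the reading of (17) with the Costa term (1/2) log(1 + P/N_eff,l) plus the superposition term log(k - l), with logarithms in base 2; it handles integer k only and does not compute the convex hull. The function names are hypothetical.

```python
import math

def n_eff(l, sigma_s2, noise_var):
    """Effective noise variance after l estimation writes."""
    return noise_var * (1.0 + sigma_s2 / (l * sigma_s2 + noise_var))

def two_phase_rate(k, l, power, sigma_s2, noise_var):
    """Rate in bits/cell for integer cost k with l estimation writes."""
    costa = 0.5 * math.log2(1.0 + power / n_eff(l, sigma_s2, noise_var))
    return costa + math.log2(k - l)

def best_estimation_length(k, power, sigma_s2, noise_var):
    """Estimation length l in {0, ..., k - 1} maximizing the two-phase rate."""
    return max(range(k),
               key=lambda l: two_phase_rate(k, l, power, sigma_s2, noise_var))

# Setting of the second curve: P/N = 100 and sigma_s^2 = 10 N (taking N = 1).
best = [best_estimation_length(k, 100.0, 10.0, 1.0) for k in range(1, 11)]
```

Under these assumptions, `best` reproduces the estimation periods quoted in the text: 0 for k = 1, 1 for k from 2 to 9, and 2 for k = 10.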




Fig. 2: Lower bound of Proposition 1 for P/N = 100. The top curve
is for $\sigma_s^2 = N$ and the bottom one for $\sigma_s^2 = 10N$.



B. Lower bound for General Channels 

We consider channels whose output support is continuous
valued, i.e., for all $(x, s) \in \mathcal{X} \times \mathcal{S}$, $P_{Y|XS}(\cdot \mid x, s)$ is absolutely
continuous with respect to the Lebesgue measure. This assumption
is necessary because the definition of target regions
for superposition coding in Section II-A implicitly assumes
continuous valued outputs. The superposition idea can be 
extended to many discrete channels as well, but we do not 
pursue this here in order to keep the exposition simple. 

For a channel with average write cost k, the two-step 
strategy involves: a) Designing a suitable estimator to estimate 
the state sequence using I writes, and b) storing information 
in the remaining (k - l) writes using Gelfand-Pinsker coding
and superposition. The Costa coding scheme for the AWGN 
channel is a special case of the Gelfand-Pinsker scheme for 
general memoryless channels with state, with the state known 
a priori at the encoder. 

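Theorem 1: Consider a channel $P_{Y|XS}(\cdot \mid x, s)$ that is absolutely
continuous with respect to the Lebesgue measure for all
$(x, s) \in \mathcal{X} \times \mathcal{S}$. With average cost $k \ge 1$, the capacity satisfies

$$C(k) \ge \operatorname{conv}\left( \max_{l \in \{0, \ldots, \lfloor k \rfloor - 1\}} \max_{\mathcal{P}} \left[ I(U; Y) - I(U; \hat{S}(l)) + \log \lfloor k - l \rfloor \right] \right) \tag{18}$$

where $\mathcal{P}$ is the set of joint distributions of $(S, \hat{S}(l), U, X, Y)$
of the form

$$P_S \cdot P_{\hat{S}(l)|S} \cdot P_{U|\hat{S}(l)} \cdot \mathbf{1}_{X = f(U, \hat{S}(l))} \cdot P_{Y|XS}.$$

Proof: See Appendix.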
Remarks: 

1) In the set of joint distributions $\mathcal{P}$, the state distribution $P_S$
and the channel law $P_{Y|XS}$ are fixed. The maximization
over $\mathcal{P}$ is therefore over the choice of estimator $\hat{S}(l)$,
auxiliary distribution $P_{U|\hat{S}(l)}$, and function f to generate
the channel input X from $(U, \hat{S}(l))$.

2) The MMSE estimator is optimal for the AWGN average-
power constrained channel, but in general the optimal
estimator depends on the channel law and the input
constraints.

We conclude this section with a brief discussion of the 
shortcomings of the two-step coding scheme discussed in 
this section. First, dedicating I writes to estimating the state 
and then coding is not optimal in general. A scheme that 
simultaneously performs estimation and coding in each write 
attempt is likely to yield higher rates, but such a scheme may 
also be harder to analyze. 

Information is stored in the cell array in
two ways: through Gelfand-Pinsker coding and superposition
coding. Each of these poses a different challenge for practical
implementation. In the Costa/Gelfand-Pinsker scheme we used
joint typicality or maximum-likelihood decoding, both of
which are computationally infeasible for a large array of n
cells. For the AWGN case, there has been progress
towards feasible decoders using structured codebooks such as
those based on lattices [21], [22].

For superposition coding, we need the reads to be very
accurate as the width $\delta$ of the intervals is made small (cf.
Figure 1). This is important during encoding (so that the
controller knows when to stop writing) as well as decoding
(for the decoder to know which target region the output lies
in). This problem can be handled by using an outer error-
correcting code to correct errors that arise due to imperfect
reads.

In the following section, we design a coding scheme that 
addresses all the above issues for the uniform-noise channel. 

III. Uniform Noise Channel with Hidden State 

The channel is described by (1), where the write noise W
is uniformly distributed in $[-a/2, a/2]$ and the state S is
uniformly distributed in $[0, B]$. The constants a and B are
assumed to be known and positive. For ease of analysis, we will
assume that B < a.

We present two code constructions, each of which gives a 
lower bound on the rewrite capacity. The first is sub-optimal 
but gives insight into features of good coding strategies. The 
second construction yields a better lower bound which is 
asymptotically optimal, i.e., it is arbitrarily close to the no- 
state capacity for sufficiently large cost constraint. The second 
scheme implicitly performs simultaneous state estimation and 
coding; further, it is computationally simple and robust to 
small inaccuracies in the reading process. 

A. Code Construction 1 

For the uniform noise rewrite channel without state given
by Y = X + W, the basic coding idea is that with an average
of k rewrites, we can shrink the effective width of the noise
interval to a/k. The average-cost capacity was obtained in [9].
Fig. 3: Each interval of width a + B is divided into k = 5 target
regions. The target regions in the interval $(-a/2, a/2 + B)$ are shown.
To write on a region in this interval, the stimulus 0 is applied. The
dashed lines indicate the part of the interval accessible with S = b.



Fact 1 [9]: For $k \ge k_0$, where the threshold $k_0$ is specified in [9],
the rewrite capacity with average cost constraint k is

$$C(k) = \log\left(k\,\frac{1 + a}{a}\right).$$

When $\frac{1+a}{a}k$ is an integer, the capacity is achieved by simply
dividing the space $[-a/2, 1 + a/2]$ into equal-sized intervals
of length a/k and choosing the target region T to be one
of these intervals with equal probability. The input X is any
point which maximizes the probability of the output falling
in the region T. When $\frac{1+a}{a}k$ is not an integer, the capacity
is achieved by a careful generalization of the above idea,
described in [9].

When there is an unknown state offset $S \in [0, B]$, the idea
is to define each target region such that there is exactly one
subset of width a/k that can be accessed with a fixed input
and an average of k writes, irrespective of the offset.

Proposition 2: For the uniform noise rewrite channel with
hidden state and average cost k > 2,

$$C(k) \ge \log\left(k \left\lfloor \frac{1 + a + B}{a + B} \right\rfloor\right). \tag{19}$$



Proof: The target regions: Refer to Figure 3. The output
space $[-a/2, 1 + a/2 + B]$ is first divided into intervals of
length (a + B) each. There are $N = \lfloor \frac{1+a+B}{a+B} \rfloor$ such intervals,
denoted $Z_i$, $0 \le i \le N - 1$. If $\frac{1+a+B}{a+B}$ is not an integer, the
remaining output space $(-a/2 + N(a + B),\ 1 + a/2 + B]$ is discarded.

For clarity, consider the case where k is an integer. Divide
each interval $Z_i$ into k target regions. The target regions for the
first interval are defined as shown in Figure 3. Target region 1
is $[-\frac{a}{2}, -\frac{a}{2} + \frac{a}{k}] \cup [\frac{a}{2}, \frac{a}{2} + \frac{a}{k}]$; region 2 is
$[-\frac{a}{2} + \frac{a}{k}, -\frac{a}{2} + \frac{2a}{k}] \cup [\frac{a}{2} + \frac{a}{k}, \frac{a}{2} + \frac{2a}{k}]$, and so on,
with each region restricted to the interval $Z_0 = [-a/2, a/2 + B]$. Similarly,
k target regions are defined for each of the N intervals.

Encoding: To reach target region t in interval $Z_i$, for $t \in
\{1, \ldots, k\}$ and $i \in \{0, \ldots, N - 1\}$, apply input X = (a + B)i until
the output falls within region t in interval i. With this input,
the accessible part of the target region has width exactly a/k
for any value of $S \in [0, B]$. This is illustrated in the bottom
part of Figure 3. Regardless of the offset, the probability of
the output falling within the target region on any write attempt
is

$$\frac{a/k}{a} = \frac{1}{k}.$$

The average number of rewrites is therefore k.
Fig. 4: Construction 2: The output space is divided into interior and exterior target regions. The bottom figure shows an interior target
region $[x, x + D]$. The state shifts the input $x + D - a/2$ to the right by an amount at most B. For all $b \in [0, B]$, the probability of hitting
the target region in each write attempt is D/a as long as $D < a - B$.



The total number of target regions is Nk, and by assigning
them equal probability, the rate is

$$I(XT; Y) = I(T; Y) = H(T) = \log(Nk) \tag{20}$$

where the first equality holds because the input X is a function
of the target region T, and the second equality is due to T
being uniquely determined by Y.

The general case where k is not an integer can be handled
by an extension of the above scheme using the techniques in
[9].

Remark: When $\frac{1+a+B}{a+B}$ is an integer, (19) can be written as

$$C(k) \ge \log\left(k\,\frac{1 + a}{a}\right) - \log\left(\frac{1 + B/a}{1 + B/(1 + a)}\right). \tag{21}$$

The first term above is the capacity when S = 0, or when S
is precisely known at the encoder. The second term is the loss
incurred by the coding scheme due to the state being unknown.

In the above construction, we designed the target regions so
that each one can be accessed with equal probability regardless
of the value of S. We did not use the rewrites to do any
state estimation. The sub-optimality of this strategy is seen by
observing that even when the number of rewrites k is very
large, the lower bound of Proposition 2 is strictly less than
the capacity with S = 0, given by the first term in (21). The
next construction remedies this deficiency.
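The gap between Construction 1 and the no-state capacity can be made concrete with a short calculation. The Python sketch below assumes that the bound of Proposition 2 is log(kN) with N the number of length-(a + B) intervals, as in the proof, and that the no-state capacity is the first term of (21); the function names are hypothetical and logarithms are in base 2.

```python
import math

def construction1_rate(k, a, B):
    """Lower bound of Construction 1: log2 of the number of target regions."""
    num_intervals = math.floor((1 + a + B) / (a + B))
    return math.log2(k * num_intervals)

def no_state_rate(k, a):
    """First term of (21): the rate when S = 0 or S is known at the encoder."""
    return math.log2(k * (1 + a) / a)

def state_loss(a, B):
    """Second term of (21): loss due to the unknown state (independent of k)."""
    return math.log2((1 + B / a) / (1 + B / (1 + a)))
```

For a = B = 0.25 the ratio (1 + a + B)/(a + B) equals 3, so the two forms of the bound coincide, and the loss of about 0.74 bits does not shrink as k grows.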

B. Code Construction 2 

As shown in Figure 4, divide the output space $[-a/2, 1 +
a/2 + B]$ into two regions: the interval $[-a/2 + B, 1 + a/2]$,
called the 'interior', and the remaining space $[-a/2, -a/2 +
B] \cup [1 + a/2, 1 + a/2 + B]$, called the 'exterior'.

Interior target regions: Divide the interior into intervals
(target regions) of width D. The key observation is that if
$D < a - B$, regardless of the value of S, each interior
target region is fully accessible with an average of a/D write
attempts with a fixed input. As illustrated in the bottom part of
Figure 4, to access the interior target region $[x, x + D]$, apply
the stimulus $(x + D - a/2)^+$, where $(z)^+ = \max(z, 0)$. To fully
access the region with offset b, we need

$$(x + D - a/2)^+ + b - a/2 \le x,$$

which holds for all $b \in [0, B]$ as long as $D < a - B$.
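The accessibility condition for interior regions can be verified directly by computing the overlap between the output support and the target region. This Python sketch assumes the stimulus is the positive part of x + D - a/2 and that the output is uniform on an interval of width a centered at stimulus plus offset; the function names are hypothetical.

```python
def interior_stimulus(x, D, a):
    """Stimulus (x + D - a/2)^+ used for the interior region [x, x + D]."""
    return max(x + D - a / 2, 0.0)

def hit_probability(x, D, a, b):
    """Probability that a single write with offset b lands in [x, x + D]."""
    u = interior_stimulus(x, D, a)
    lo, hi = u + b - a / 2, u + b + a / 2          # support of the output
    overlap = max(0.0, min(hi, x + D) - max(lo, x))
    return overlap / a
```

When D is at most a - B, the hit probability equals D/a for every offset in [0, B], so the average number of writes is a/D; for larger D the region is only partly covered for some offsets.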

Exterior target regions: As shown in Figure 4, define 2m exterior
target regions for an integer $m \ge 1$. For i = 1, ..., 2m,
the ith exterior target region, labeled $E_i$, is

$$E_i = \left(-\frac{a}{2} + (i - 1)\frac{B}{2m},\ -\frac{a}{2} + i\,\frac{B}{2m}\right) \cup \left(1 + \frac{a}{2} + (i - 1)\frac{B}{2m},\ 1 + \frac{a}{2} + i\,\frac{B}{2m}\right).$$

With this construction, we present a coding scheme that
achieves the following lower bound on the rewrite capacity.
Theorem 2: For k > 2,

$$C(k) \ge \max_{p, D, m}\ h(p) + p \log\left\lfloor \frac{1 + a - B}{D} \right\rfloor + (1 - p) \log 2m$$

where the maximum is over $p \in [0, 1]$, $D \in (0, a - B)$ and
integers $m \ge 1$ that satisfy


2m + 1 + 



1 



m 



In 



(i-<y 2 



•In* 



P D~ K 



(22) 



where the optimal $\delta_i \in [0, 1)$ for i = 1, 2, ..., m is determined
by the following equation:

$$2(1 - \delta_i)^2 + 3(i - 1)(1 - \delta_i) + (i - \delta_i)\ln\delta_i = 0. \tag{23}$$

The optimal $\delta_i$ for a few values of i are listed in Table I.
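The values in Table I can be reproduced by solving (23) numerically. The Python sketch below reads (23) as 2(1 - δ)² + 3(i - 1)(1 - δ) + (i - δ) ln δ = 0 with the natural logarithm, and uses plain bisection on the bracket [1e-9, 0.5], where the left side changes sign exactly once; the function names and the bracket are choices made for this illustration.

```python
import math

def optimality_residual(delta, i):
    """Left side of (23) for the i-th threshold."""
    return (2 * (1 - delta) ** 2
            + 3 * (i - 1) * (1 - delta)
            + (i - delta) * math.log(delta))

def optimal_delta(i, lo=1e-9, hi=0.5, iterations=100):
    """Root of (23) in (lo, hi) by bisection; residual is < 0 at lo and > 0 at hi."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if optimality_residual(mid, i) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

deltas = [optimal_delta(i) for i in range(1, 7)]
```

Note that δ = 1 also makes the left side of (23) vanish, which is why the search is restricted to a bracket well inside (0, 1).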

Remark: The $\delta_i$ in the above theorem can be chosen to be
arbitrary values in [0, 1). Picking $\delta_i$ that satisfy (23) minimizes
the number of rewrites, given by the left side of (22). For
example, (22) can be replaced by a simpler condition obtained
by setting $\delta_i = 0$ for all i:

1U "' ! ' (24) 



a .a 

V — I- (i - p) — 

F D B 



2m + 1 + 



< K. 



The proof of the theorem is given in the next section. Figure
5 shows the lower bound of Theorem 2 with a = 1/3 and
B = a/2 for various values of k.

We now show that the above lower bound converges to the
no-state capacity as the rewrite constraint $k \to \infty$.

Corollary 1: The rate R(k) achieved by Theorem 2 satisfies

$$\log\left(k\,\frac{1 + a}{a}\right) - R(k) \to 0 \quad \text{as } k \to \infty.$$



Proof: Choose D = a/n and m = Mj = Note that 
for all B < a, $D = a/k < (a - B)$ for sufficiently large
k. With this choice and setting $\delta_i = 0$ for all i, the average
number of rewrites given by the left side of (24) becomes



P«+(1-P)o 2 



1 

m 



mm! 



where e K = 0(- 5g -^) goes to as k 
we see that 

l+a 



= k(1 + e K ), 

oo. Then with p = 



R(k(1 + e„)) = log 
Therefore J?(«) = log 



1 



l+e„ 

Remarks : 

1) The coding scheme for the uniform noise channel (described
in Section IV) stores information cell-by-cell,
and therefore has low computational complexity. In contrast,
each codeword in the Gelfand-Pinsker scheme of
Section II is defined over a large array of n cells, which
makes the encoding and decoding computationally hard.

2) All results in this section generalize to the case where
the hidden state is uniformly distributed over a
support set of width B other than [0, B].

3) The coding scheme of Construction 1 can directly be
used when B > a. Construction 2 needs modification:
the interior target region will not completely lie within
the noise support when B > a. This can be addressed
by shifting the input stimulus by an appropriate amount
if it is determined that the offset b > a. This is similar in
principle to the switching strategy used for the exterior
regions.

IV. Proof of Theorem 2

To highlight the main ideas, we start with a simplified 
coding scheme for the case of m = 1, i.e., two exterior target 
regions. 

A. Two Exterior Target Regions 

Coding Scheme: Fix $p \in [0, 1]$. For each cell, an interior
target region is picked with probability p and an exterior region
is picked with probability (1 - p). All interior target regions
are equally likely, as are the exterior regions. Formally, each
interior region has probability $p / \lfloor \frac{1+a-B}{D} \rfloor$ and the two exterior
regions each have probability (1 - p)/2. Refer to Figure 6. To
write on interior region $[x, x + D]$, repeatedly apply stimulus
$(x + D - a/2)^+$ until the output falls within the region.

To write on exterior region $E_1$: Apply stimulus 1 until either
the output falls in $(1 + a/2, 1 + a/2 + B/2)$, or it is less than
$1 - a/2 + B/2$. If the former occurs, stop. Otherwise, apply


TABLE I: Optimal value of $\delta_i$ for $1 \le i \le m$

i             1        2        3        4        5        6
$\delta_i$    0.2032   0.1038   0.0858   0.0782   0.0740   0.0713



stimulus 0 until the output falls in $(-a/2, -a/2 + B/2)$. The
intuition is that the right bin of $E_1$ is fully accessible with
stimulus 1 if the offset lies in the interval [B/2, B]. We switch
to the left bin of $E_1$ if we detect that the offset lies outside
this interval.

To write on exterior region $E_2$: Apply stimulus 0 until either
the output falls in $(-a/2 + B/2, -a/2 + B)$ or it is greater
than $a/2 + B/2$. If the former occurs, stop. Otherwise, switch
to applying stimulus 1 until the output falls in $(1 + a/2 +
B/2, 1 + a/2 + B)$. If the offset lies in the interval [0, B/2],
the left bin of $E_2$ is fully accessible with stimulus 0. We switch
to the right bin of $E_2$ if we detect that the offset lies outside
this interval.

Analysis: Since we have two exterior target regions, each with 
probability $(1 - p)/2$, and $\lfloor (1 + a - B)/D \rfloor$ interior 
regions, each with probability $p / \lfloor (1 + a - B)/D \rfloor$, 
the rate of information stored in each cell is 

$$H(T) = h(p) + p \log \left\lfloor \frac{1 + a - B}{D} \right\rfloor + (1 - p) \log 2. \tag{25}$$
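The rate expression can be evaluated numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes logarithms to base 2 (rate in bits per cell) and that the number of interior regions is $\lfloor (1 + a - B)/D \rfloor$, as in the analysis above.

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with the convention 0*log(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate_two_exterior(p: float, a: float, B: float, D: float) -> float:
    """Rate in bits per cell with two exterior regions (hypothetical helper).

    Assumes floor((1 + a - B) / D) equally likely interior regions.
    """
    n_interior = math.floor((1 + a - B) / D)
    if n_interior < 1:
        raise ValueError("D is too large: no interior region fits")
    # h(p) + p*log2(n_interior) + (1 - p)*log2(2)
    return binary_entropy(p) + p * math.log2(n_interior) + (1 - p)

# Example with the parameters of Fig. 5 (a = 1/3, B = a/2) and an assumed D.
print(rate_two_exterior(0.5, 1 / 3, 1 / 6, 0.1))
```

With $p = 1$ only interior regions are used and the rate reduces to the logarithm of their number; with $p = 0$ it is one bit per cell.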

Next we compute the average number of writes and set it 
equal to $\kappa$: 

$$\kappa = p\,E[\#\text{ writes} \mid \text{interior}] + (1 - p)\,E[\#\text{ writes} \mid \text{exterior}] = p\,\frac{a}{D} + (1 - p)\,E[\#\text{ writes} \mid \text{exterior}]. \tag{26}$$



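By symmetry, 

$$\begin{aligned} E[\#\text{ writes} \mid \text{exterior}] &= E[\#\text{ writes} \mid \text{ext. region } E_1] \\ &= \int_0^B E[\#\text{ writes} \mid E_1, S = b]\,\frac{1}{B}\,db \\ &= \int_0^{B/2} E[\#\text{ writes} \mid E_1, S = b]\,\frac{1}{B}\,db + \int_{B/2}^{B} \frac{a}{B/2}\,\frac{1}{B}\,db, \end{aligned} \tag{27}$$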



since the right bin of $E_1$ is fully accessible with stimulus 1 
when $S \in [B/2, B)$. We now show that for all $b \in [0, B/2)$, 

$$E[\#\text{ writes} \mid E_1, S = b] = \frac{4a}{B}. \tag{28}$$



Recall that for $E_1$, we first apply stimulus 1 until the 
output either falls in $(1 + a/2,\ 1 + a/2 + B/2)$ or is less than 
$1 - a/2 + B/2$. For $b \in [0, B/2)$, the probability of the first 
event occurring in any write attempt is $b/a$, and that of the 
second event occurring is $(B/2 - b)/a$. Hence the probability 
of the first step being completed in each write attempt is 

$$\frac{b}{a} + \frac{B/2 - b}{a} = \frac{B}{2a}. \tag{29}$$



Fig. 5: Achievable rate of Theorem 2 with noise width a = 1/3 and 
S uniformly distributed in [0, B] with B = a/2. (Horizontal axis: 
average number of writes.) 

Therefore the average number of writes for the first step of 
$E_1$ is $2a/B$ for all $b \in [0, B/2)$. The probability of the first 
step ending by obtaining an output less than $1 - a/2 + B/2$ 
is 

$$\frac{(B/2 - b)/a}{b/a + (B/2 - b)/a} = \frac{B/2 - b}{B/2}.$$

When this event occurs, the average number of additional 
writes required (by applying stimulus 0) is $a/(B/2 - b)$. Thus 
for $b \in [0, B/2)$, the average number of writes for writing on 
$E_1$ is 

$$E[\#\text{ writes} \mid E_1, S = b] = \frac{2a}{B} + \frac{B/2 - b}{B/2} \cdot \frac{a}{B/2 - b} = \frac{4a}{B}. \tag{30}$$

Substituting in (27), we obtain 

$$E[\#\text{ writes} \mid \text{exterior}] = \frac{4a}{B} \cdot \frac{1}{2} + \frac{2a}{B} \cdot \frac{1}{2} = \frac{3a}{B}. \tag{31}$$

Using this in (26), we get 

$$\kappa = p\,\frac{a}{D} + (1 - p)\,\frac{3a}{B}, \tag{32}$$

which corresponds to (22) with $m = 1$ and $\delta_1 = 0$. We now 
modify the scheme slightly to reduce the average number of 
rewrites to the level stated in Theorem 2: 

$$\kappa = p\,\frac{a}{D} + (1 - p)\,\frac{a}{B}\left(3 + \ln\frac{1}{1 - \delta_1} + \frac{\delta_1}{1 - \delta_1}\ln\delta_1\right), \tag{33}$$

with $\delta_1$ given by Table I. 
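To see how much the modified switching rule saves, the two expressions for the average number of writes can be compared directly. The helper below is a hypothetical sketch; it uses natural logarithms and treats $\delta_1 = 0$ as the basic scheme (the logarithmic terms vanish in that limit).

```python
import math

def avg_writes_m1(p, a, B, D, delta1=0.0):
    """Average writes per cell with two exterior regions (hypothetical helper).

    delta1 = 0 gives the basic switching rule (exterior cost 3a/B);
    0 < delta1 < 1 gives the generalized rule.
    """
    if not 0.0 <= delta1 < 1.0:
        raise ValueError("delta1 must lie in [0, 1)")
    bracket = 3.0
    if delta1 > 0.0:
        bracket += math.log(1 / (1 - delta1)) + delta1 / (1 - delta1) * math.log(delta1)
    return p * a / D + (1 - p) * (a / B) * bracket

a, B, D = 1 / 3, 1 / 6, 0.05
print(avg_writes_m1(0.0, a, B, D))           # exterior cost, basic rule
print(avg_writes_m1(0.0, a, B, D, 0.2032))   # exterior cost, delta_1 from Table I
```

With $p = 0$ only the exterior cost remains, so the two printed values isolate the effect of the switching threshold.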

Optimizing the Switching Strategy: To write on exterior 
region $E_1$ in Figure 6, the above coding scheme switches from 
stimulus 1 to stimulus 0 when an output less than $1 - a/2 + B/2$ 
is obtained. Such an output indicates that the value of the 
hidden state $S$ is less than $B/2$, which implies that the right bin 
of $E_1$, the region $[1 + a/2,\ 1 + a/2 + B/2]$, is not fully accessible 
with stimulus 1; so the scheme switches to targeting the left 
bin of $E_1$ with stimulus 0. This switching strategy is not 
optimal. Consider a more general switching strategy of the 
following form: switch from stimulus 1 to 0 once you obtain 
an output less than $1 - a/2 + (B/2)(1 - \delta_1)$ for some 
$\delta_1 \in [0, 1)$. This corresponds to switching once you detect 
that $S$ is less than $B(1 - \delta_1)/2$. We now determine the optimum 
value of $\delta_1$ that minimizes the average number of rewrites. 
By symmetry, the switching strategy for exterior region $E_2$ is to 
switch from stimulus 0 to 1 when you get an output greater than 
$a/2 + (B/2)(1 + \delta_1)$. 

Fig. 6: Construction 2 with two exterior target regions. 

The average number of rewrites for region $E_1$ is 

$$E[\#\text{ writes} \mid \text{region } E_1] = \int_0^B E[\#\text{ writes} \mid E_1, S = b]\,\frac{1}{B}\,db, \tag{34}$$

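where $E[\#\text{ writes} \mid E_1, S = b]$ with the new switching 
strategy can be calculated to be 

$$E[\#\text{ writes} \mid E_1, S = b] = \begin{cases} \dfrac{2a}{B} & \text{for } \frac{B}{2} \le b \le B, \\[4pt] \dfrac{a}{b} & \text{for } \frac{B}{2}(1 - \delta_1) \le b < \frac{B}{2}, \\[4pt] \dfrac{2a}{B(1 - \delta_1)}\left(1 + \dfrac{(1 - \delta_1)B/2 - b}{B/2 - b}\right) & \text{for } 0 \le b < \frac{B}{2}(1 - \delta_1). \end{cases} \tag{35}$$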

Substituting this in (34) and calculating the integral, we obtain 

$$E[\#\text{ writes} \mid E_1] = \frac{a}{B}\left(3 + \ln\frac{1}{1 - \delta_1} + \frac{\delta_1}{1 - \delta_1}\ln\delta_1\right). \tag{36}$$

Using (36) in (26) completes the proof for $m = 1$. 
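The closed form (36) can be sanity-checked by simulating the write procedure for $E_1$ with the generalized switching threshold. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes output = stimulus + hidden state + noise with noise uniform on $[-a/2, a/2]$, $S$ uniform on $[0, B]$, $B \le a$, and $\delta_1 > 0$ (for $\delta_1 = 0$ the per-cell write count has a very heavy tail and the simulation converges slowly). All function names are hypothetical.

```python
import math
import random

def simulate_writes_E1(a, B, delta, n_cells=200_000, seed=0):
    """Monte Carlo estimate of E[# writes | region E_1] (assumed channel model)."""
    rng = random.Random(seed)
    right_lo, right_hi = 1 + a / 2, 1 + a / 2 + B / 2   # right bin of E_1
    left_lo, left_hi = -a / 2, -a / 2 + B / 2           # left bin of E_1
    switch_below = 1 - a / 2 + (B / 2) * (1 - delta)    # switching threshold
    total = 0
    for _ in range(n_cells):
        b = rng.uniform(0.0, B)        # hidden state, fixed for this cell
        stimulus, lo, hi = 1.0, right_lo, right_hi
        while True:
            total += 1
            y = stimulus + b + rng.uniform(-a / 2, a / 2)
            if lo < y < hi:
                break                  # output landed in the targeted bin
            if stimulus == 1.0 and y < switch_below:
                # Output shows S < B(1 - delta)/2: target the left bin instead.
                stimulus, lo, hi = 0.0, left_lo, left_hi
    return total / n_cells

def closed_form_E1(a, B, delta):
    """Right-hand side of (36), natural logarithms."""
    return (a / B) * (3 + math.log(1 / (1 - delta))
                      + delta / (1 - delta) * math.log(delta))

a, B, delta = 1 / 3, 1 / 6, 0.2032
print(simulate_writes_E1(a, B, delta), closed_form_E1(a, B, delta))
```

The two printed numbers should agree up to Monte Carlo error; changing `delta` away from the Table I value should raise both.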



B. 2m Exterior Target Regions 

Coding Scheme: See Figure 4. Writing on the interior 
regions is the same as before: for interior region $[x, x + D]$, 
repeatedly apply stimulus $(x + D - a/2)^+$. To write on exterior 
region $i$, $1 \le i \le 2m$: 

• If $1 \le i \le m$, write 1 until the output falls in region $E_i$ 
or it is less than $1 - a/2 + B(i - \delta_i)/(2m)$. In the first case, 
stop. In the second case, switch to writing 0 until the output 
falls in the left bin of region $i$. 

• If $m + 1 \le i \le 2m$, write 0 until the output falls in region 
$E_i$ or it is greater than $a/2 + B(i - 1 + \delta_{2m+1-i})/(2m)$. In 
the first case, stop. In the second case, switch to writing 1 until 
the output falls in the right bin of region $i$. 

Analysis: The rate calculation is straightforward, the only 
change from the previous subsection being that each of the 
exterior target regions now represents $\log 2m$ bits of information. 
The average number of writes for an interior target region 
is $a/D$. For an exterior region, we calculate it separately for 
each $E_i$, $i = 1, \dots, m$, as follows. Note that by symmetry, $E_i$ 
and $E_{2m+1-i}$ have the same average cost. We have 

$$E[\#\text{ writes} \mid \text{region } E_i] = \int_0^B E[\#\text{ writes} \mid E_i, S = b]\,\frac{1}{B}\,db, \tag{37}$$



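where $E[\#\text{ writes} \mid E_i, S = b]$ can be calculated to be 

$$E[\#\text{ writes} \mid E_i, S = b] = \begin{cases} \dfrac{2ma}{B} & \text{for } \frac{iB}{2m} \le b \le B, \\[4pt] \dfrac{2ma}{2mb - (i - 1)B} & \text{for } \frac{B(i - \delta_i)}{2m} \le b < \frac{iB}{2m}, \\[4pt] \dfrac{2ma}{(1 - \delta_i)B}\left(1 + \dfrac{(i - \delta_i)B - 2mb}{iB - 2mb}\right) & \text{for } \frac{B(i - 1)}{2m} \le b < \frac{B(i - \delta_i)}{2m}, \\[4pt] \dfrac{2ma}{(i - \delta_i)B - 2mb} + \dfrac{2ma}{B} & \text{for } 0 \le b < \frac{B(i - 1)}{2m}. \end{cases} \tag{38}$$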

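Using this in (37) and calculating the integral, we obtain that 
for $1 \le i \le m$: 

$$E[\#\text{ writes} \mid E_i] = \frac{a}{B}\left(2m + 1 + \ln\frac{i - \delta_i}{(1 - \delta_i)^2} + \frac{\delta_i}{1 - \delta_i}\ln\delta_i\right). \tag{39}$$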
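The step from the piecewise expression (38) to the closed form (39) can be checked by numerical integration. The sketch below is illustrative: the function names are hypothetical, the four branches follow the case analysis of the switching rule (right bin fully accessible, partially accessible without switching, partially accessible with possible switching, inaccessible), and it assumes $B \le a$ and $0 < \delta_i < 1$.

```python
import math

def cond_writes(b, i, m, a, B, delta):
    """E[# writes | E_i, S = b] for 1 <= i <= m, by the four cases of the analysis."""
    u = 2 * m * b                       # rescaled state: bin edges at multiples of B
    if u >= i * B:                      # right bin fully accessible
        return 2 * m * a / B
    if u >= (i - delta) * B:            # partially accessible, never switch
        return 2 * m * a / (u - (i - 1) * B)
    if u >= (i - 1) * B:                # partially accessible, may switch
        first = 2 * m * a / ((1 - delta) * B)
        return first * (1 + ((i - delta) * B - u) / (i * B - u))
    # Right bin inaccessible: always switch, then the left bin is fully accessible.
    return 2 * m * a / ((i - delta) * B - u) + 2 * m * a / B

def avg_writes_numeric(i, m, a, B, delta, steps=20_000):
    """Midpoint-rule approximation of (1/B) * integral over [0, B] of cond_writes."""
    h = B / steps
    total = sum(cond_writes((k + 0.5) * h, i, m, a, B, delta) for k in range(steps))
    return total * h / B

def avg_writes_closed(i, m, a, B, delta):
    """Closed form for E[# writes | E_i], natural logarithms."""
    return (a / B) * (2 * m + 1 + math.log((i - delta) / (1 - delta) ** 2)
                      + delta / (1 - delta) * math.log(delta))

a, B, m, delta = 1 / 3, 1 / 6, 3, 0.1
for i in range(1, m + 1):
    print(i, avg_writes_numeric(i, m, a, B, delta), avg_writes_closed(i, m, a, B, delta))
```

For each $i$ the numerical integral and the closed form should agree to several decimal places.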

The average number of write attempts for an exterior region 
is therefore 

$$\frac{a}{B}\left[2m + 1 + \frac{1}{m}\sum_{i=1}^{m}\left(\ln\frac{i - \delta_i}{(1 - \delta_i)^2} + \frac{\delta_i}{1 - \delta_i}\ln\delta_i\right)\right].$$

For each $i \in \{1, \dots, m\}$, it is easily verified that the 
$\delta_i \in [0, 1)$ that minimizes (39) satisfies (23). This completes 
the proof. 
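The minimizing thresholds can also be found numerically, without using the stationarity condition. The sketch below minimizes the $\delta_i$-dependent part of the exterior cost by ternary search; the function names are hypothetical, and the search assumes that this term is unimodal on $(0, 1)$.

```python
import math

def exterior_cost_term(i, delta):
    """delta-dependent part of the average write count for exterior region E_i."""
    return (math.log((i - delta) / (1 - delta) ** 2)
            + delta / (1 - delta) * math.log(delta))

def best_delta(i, tol=1e-10):
    """Ternary search for the minimizer on (0, 1); assumes unimodality."""
    lo, hi = 1e-12, 1 - 1e-9
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if exterior_cost_term(i, m1) < exterior_cost_term(i, m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

for i in range(1, 7):
    print(i, round(best_delta(i), 4))
```

The printed values should land close to the entries of Table I.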



V. Conclusion 

In a channel with unknown parameters (modeled by a 
hidden state), rewrites increase the capacity in two ways: 1) 
by mitigating the effect of write noise, and 2) by enabling 
the write controller to get progressively better estimates of 
the state. For the uniform noise channel, one of the key 
observations was that the hidden state does not affect coding 
in the interior region. This idea could be generalized to other 
channels where the output has bounded support. 

There are many open questions to be explored. One is 
obtaining a capacity upper bound, which is challenging as 
we need to consider all adaptive input strategies. The general 
capacity lower bound can be improved via a scheme that 
does simultaneous coding and estimation; the challenge here 
lies in analyzing such a coding scheme to get a computable 
expression for the achieved rate. Another goal is to modify 
the superposition scheme so that it is robust to small amounts of 
read noise. As discussed at the end of Section |TTJ the current 
scheme requires the reads to be highly accurate. 

The channel model analyzed here is motivated by nonvolatile 
memories such as Phase Change Memory and Resistive RAM. The 
coding schemes illustrate how information-theoretic techniques 
like superposition can be used to increase the storage density. 
Though the schemes presented are for analog storage channels, 
the ideas can be extended to finite alphabet channels which arise 
in technologies such as Magnetic RAM [23]. A more sophisticated 
channel model for real memories is one where the value written 
on the cell depends on the stimulus as well as the previous value 
stored in the cell. Another interesting possibility is extending 
the model to account for stochastic variation of the cell contents 
over time, a phenomenon which is encountered in most memory 
technologies and which manifests as a read noise at the "receiver". 
We believe that developing efficient rewritable schemes for 
such realistic models will have a significant impact on several 
nonvolatile memory technologies. 

Appendix 

Proof of Theorem 1 

Fix an estimation period $l \in \{0, \dots, \lfloor \kappa - 1 \rfloor\}$, an 
estimator $\hat{S}(l)$, a distribution $P_{U|\hat{S}(l)}$ and a function $f$ 
to generate the channel input $X = f(U, \hat{S}(l))$. This defines a 
joint distribution of $(S, \hat{S}(l), U, X, Y)$ in the set $\mathcal{P}$. 

Construct a codebook consisting of $2^{nR_1}$ codewords, whose 
elements are picked i.i.d. according to $P_U$, the marginal 
distribution of the auxiliary random variable $U$. This codebook 
is partitioned into $2^{nR}$ bins, where 

$$R < I(U; Y) - I(U; \hat{S}(l)). \tag{40}$$

$R_1 > R$ will be specified later. 

The output space of each cell is divided into $\lfloor \kappa - l \rfloor$ 
target regions, as described in Section II-A (see Figure 1). 

Encoding: The message to be stored in the n-cell array 
consists of two parts $(m_1, m_2)$, where $m_1 \in \{1, \dots, 2^{nR}\}$ 
and $m_2 \in \{1, \dots, (\lfloor \kappa - l \rfloor)^n\}$. Let $\hat{S}(l)$ be the 
state estimate obtained using the first $l$ writes. To encode the 
first part of the message, we choose a codeword $U$ from the 
$m_1$th bin such that the pair $(U, \hat{S}(l))$ is jointly typical 
[20, Section 8.2] according to the distribution described by the 
following joint density: 

$$P_{U, \hat{S}(l)}(u, \hat{s}) = \int P_S(s)\,P_{\hat{S}(l)|S}(\hat{s} \mid s)\,P_{U|\hat{S}(l)}(u \mid \hat{s})\,ds. \tag{41}$$



From rate-distortion theory [20], such a codeword $U$ can be 
found with high probability if 

$$R_1 - R > I(U; \hat{S}(l)). \tag{42}$$

(42) gives a lower bound on the minimum number of codewords 
in each bin ($2^{n(R_1 - R)}$) required for successful encoding. 
The input stimulus $X = \{X_i\}_{i=1}^{n}$ is generated symbol by 
symbol as $X_i = f(U_i, \hat{S}_i(l))$. 

The second part of the message is conveyed through 
superposition coding. For cell $i$, apply stimulus $X_i$ until the output 
falls in the appropriate target region. For any realization of the 
input stimulus and state, the output is equally likely to fall in 
each of the target regions with probability $1/\lfloor \kappa - l \rfloor$. 
Hence the average number of writes required after the estimation 
period is $\lfloor \kappa - l \rfloor$, and the average total writes per cell 
is $l + \lfloor \kappa - l \rfloor$. 
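The write count in the superposition step is geometric: if each write lands in any of $K$ target regions with probability $1/K$, independently of earlier writes, the mean number of writes to hit a given region is $K$. The sketch below checks this by simulation; the function name and the independence assumption across writes are illustrative, with $K$ standing in for $\lfloor \kappa - l \rfloor$.

```python
import random

def mean_writes_to_hit(K, n_cells=100_000, seed=0):
    """Average number of writes until the output lands in the target region.

    Assumes each write falls in one of K regions uniformly at random,
    independently of previous writes.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_cells):
        target = rng.randrange(K)       # region selected by the message
        writes = 1
        while rng.randrange(K) != target:
            writes += 1                 # missed the target: rewrite
        total += writes
    return total / n_cells

print(mean_writes_to_hit(4))  # should be close to 4
```

Adding the $l$ writes of the estimation period to this mean gives the total average cost stated above.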

Decoding: The decoder attempts to find a codeword $U$ that 
is jointly typical with the stored sequence $Y$ according to 
(41). If there is a unique such codeword, its bin is decoded 
as the message $m_1$. The target region containing the output 
of each cell gives the message $m_2$. The codeword $U$ can be 
successfully decoded if the rate of the codebook satisfies 

$$R_1 < I(U; Y). \tag{43}$$



Combining (42) and (43), we conclude that $U$ can be 
successfully encoded and decoded if (40) is satisfied. 

We have thus shown that a total of $R + \log \lfloor \kappa - l \rfloor$ 
bits/cell can be reliably stored and decoded with average write 
cost $l + \lfloor \kappa - l \rfloor$ as long as $R$ satisfies (40). 
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